Tweet classification using Semantic Word-Embedding with Logistic Regression

Authors

Muhammad Rafi

Saeed Ahmed

Fawwad Ahmed

Fawzan Ahmed

Abstract

This paper presents a text classification approach that assigns tweets to one of two classes, availability or need, based on their content. The approach uses a language model based on fixed-length word embeddings to capture the semantic relationships among words, and logistic regression for the actual classification. Logistic regression models the relationship between the categorical dependent variable (the tweet label) and the fixed-length word embedding of the tweet content by estimating the probability of each label given that embedding. The regression function is fitted by maximum likelihood estimation. The approach achieved 84% classification accuracy for the two classes on the training set provided for the shared task "Information Retrieval from Microblogs during Disasters (IRMiDis)", held as part of the 9th meeting of the Forum for Information Retrieval Evaluation (FIRE 2017).
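The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how word vectors are combined into a fixed-length tweet vector, so averaging is assumed here, and the tiny 3-dimensional embedding table, the example tweets, and all function names are hypothetical. The classifier is a plain logistic regression fitted by gradient ascent on the log-likelihood, which is one way to carry out maximum likelihood estimation.

```python
import math

# Hypothetical 3-dimensional word embeddings, invented for illustration only.
# A real system would load pre-trained vectors instead.
EMBEDDINGS = {
    "need": [0.9, 0.1, 0.0],
    "require": [0.8, 0.2, 0.1],
    "urgently": [0.7, 0.0, 0.2],
    "available": [0.1, 0.9, 0.0],
    "distributing": [0.0, 0.8, 0.2],
    "supplying": [0.2, 0.7, 0.1],
    "water": [0.4, 0.4, 0.9],
    "food": [0.5, 0.5, 0.8],
    "tents": [0.4, 0.5, 0.7],
}
DIM = 3


def embed_tweet(text):
    """Average the vectors of known words (assumed composition) into one fixed-length vector."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a tweet vector x belongs to class 1 ("need")."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient ascent on the mean log-likelihood."""
    weights, bias = [0.0] * DIM, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * DIM
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = y - predict_proba(weights, bias, x)  # d(log-likelihood)/dz
            for j in range(DIM):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


def predict_label(text, weights, bias):
    """Return 1 for "need" and 0 for "availability"."""
    return 1 if predict_proba(weights, bias, embed_tweet(text)) >= 0.5 else 0


# Hypothetical labelled tweets: 1 = need, 0 = availability.
TRAIN_TWEETS = [
    ("need water urgently", 1),
    ("require food", 1),
    ("urgently need tents", 1),
    ("water available", 0),
    ("distributing food", 0),
    ("supplying tents", 0),
]

features = [embed_tweet(text) for text, _ in TRAIN_TWEETS]
labels = [label for _, label in TRAIN_TWEETS]
weights, bias = train(features, labels)

for tweet in ("need food", "tents available"):
    print(tweet, "->", "need" if predict_label(tweet, weights, bias) else "availability")
```

Averaging keeps the feature vector the same length regardless of tweet length, which is what lets a single logistic regression handle tweets of any size. In practice a library implementation of logistic regression with regularization would likely replace the hand-written training loop.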

Similar resources

Data Sets: Word Embeddings Learned from Tweets and General Data

A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, i...

deepCybErNet at EmoInt-2017: Deep Emotion Intensities in Tweets

This working note presents the methodology used in deepCybErNet submission to the shared task on Emotion Intensities in Tweets (EmoInt) WASSA-2017. The goal of the task is to predict a real valued score in the range [0-1] for a particular tweet with an emotion type. To do this, we used Bag-of-Words and embedding based on recurrent network architecture. We have developed two systems and experime...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

MITRE: Seven Systems for Semantic Similarity in Tweets

This paper describes MITRE’s participation in the Paraphrase and Semantic Similarity in Twitter task (SemEval-2015 Task 1). This effort placed first in Semantic Similarity and second in Paraphrase Identification with scores of Pearson’s r of 61.9%, F1 of 66.7%, and maxF1 of 72.4%. We detail the approaches we explored including mixtures of string matching metrics, alignments using tweet-specific...

Improving Twitter Sentiment Classification via Multi-Level Sentiment-Enriched Word Embeddings

Most of existing work learn sentiment-specific word representation for improving Twitter sentiment classification, which encoded both n-gram and distant supervised tweet sentiment information in learning process. They assume all words within a tweet have the same sentiment polarity as the whole tweet, which ignores the word its own sentiment polarity. To address this problem, we propose to lear...

Publication date: 2017